A Novel Method of Text Clustering for Chinese Spam Based on Semantic Body

Authors

  • ZHANG Qiu-yu
  • WANG Peng
Abstract

Statistics-based spam filtering methods perform poorly on new types of spam that use synonym substitution and camouflage. This paper therefore proposes a new text clustering method based on the Semantic Body for filtering Chinese spam. Word sense disambiguation, lexical chains based on HowNet, and statistics-based TF-IDF are used to extract mail features, and the Semantic Body is obtained from this process. Text clustering based on semantic distance is then applied to the Semantic Body. Experimental results with CCERT Chinese-rules.cf show that the proposed approach performs well in filtering new types of Chinese text spam.
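
The abstract describes the pipeline only at a high level. As a rough illustration of two of its generic ingredients, TF-IDF weighting and distance-based clustering, here is a minimal sketch under clearly stated assumptions: mails are already word-segmented (raw Chinese text would need a segmenter first); the HowNet-based word sense disambiguation and lexical chains are omitted entirely; cosine distance between TF-IDF vectors stands in for the paper's semantic distance; and single-pass threshold clustering is swapped in as the clustering algorithm. The function names `tfidf_vectors`, `cosine_distance` and `single_pass_cluster`, the smoothed IDF formula and the threshold value are all hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Assumption: each document is already a list of word tokens.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vec = {}
        for term, count in counts.items():
            # Smoothed IDF (an assumption) keeps terms found in every mail non-zero.
            idf = math.log((1 + n) / (1 + df[term])) + 1.0
            vec[term] = (count / total) * idf
        vectors.append(vec)
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; stand-in for semantic distance."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def single_pass_cluster(vectors, threshold):
    """Assign each vector to the nearest cluster leader within `threshold`,
    otherwise start a new cluster led by that vector. Returns cluster labels."""
    leaders = []  # index of the first document in each cluster
    labels = []
    for i, vec in enumerate(vectors):
        best, best_dist = None, None
        for c, leader in enumerate(leaders):
            d = cosine_distance(vec, vectors[leader])
            if best_dist is None or d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            labels.append(best)
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels


if __name__ == "__main__":
    # Hypothetical, pre-segmented toy mails: two spam-like, two ordinary.
    mails = [
        ["发票", "优惠", "发票", "代开"],
        ["代开", "发票", "优惠"],
        ["会议", "议程", "明天"],
        ["议程", "会议", "纪要"],
    ]
    print(single_pass_cluster(tfidf_vectors(mails), threshold=0.5))  # prints [0, 0, 1, 1]
```

Raising `threshold` merges more mails into fewer clusters. Single-pass clustering was chosen only because it is short and deterministic; labelling each resulting cluster as spam or normal mail would need a further step that this sketch does not attempt.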

Similar resources

Fuzzy Clustering based on Semantic Body and its Application in Chinese Spam Filtering

The text of an e-mail is its main body. Its content is reflected by a semantic body formed by a large number of semantic elements, so studying the semantic body information of spam is the most authoritative and effective way to analyze its text. First, this paper takes advantage of HowNet in the analysis of semantic elements to analyze the semantic bodies in e-mail text, then proposes the method...

Applications of Text Clustering Based on Semantic Body for Chinese Spam Filtering

Statistics-based spam filtering methods are not effective enough at filtering new types of spam that use synonym substitution and camouflage, because they ignore the semantic relations between words in the text and judge only from the words themselves. Therefore, a spam filtering method based on the semantic body is proposed in this paper. The method adopts lexical c...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Wavelet Packet Transform-Based Algorithm for Mixing Matrix Estimation

Regular papers:
  • Wavelet Packet Transform-Based Algorithm for Mixing Matrix Estimation (Yujie Zhang, Huiming Peng, and Hongwei Li)
  • Applications of Text Clustering Based on Semantic Body for Chinese Spam Filtering (Qiu-yu Zhang, Peng Wang, and Hui-juan Yang)
  • Uncertainty Time Series' Multi-Scale Fractional-Order Association Model (Yuran Liu, Mingliang Hou, and Yanglie Fu)
  • Evaluation of OpenID-Based Doubl...

A Novel Method of Spam Mail Detection using Text Based Clustering Approach

A novel method of efficient spam mail classification using clustering techniques is presented in this research paper. E-mail spam is one of the major problems of today's internet, causing financial damage to companies and annoying individual users. Among the approaches developed to stop spam, filtering is an important and popular one. A new spam detection technique using the text clusterin...

Publication date: 2012